Limit session state during metadata queries in Iceberg #19757
Conversation
Let it differentiate between "a"."b.c" and "a.b"."c" tables.
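The commit message above refers to cache-key ambiguity: naively joining name parts with a bare dot maps both tables to the same key. A minimal sketch of the idea, with a hypothetical quoting helper that is illustrative rather than Trino's actual key format:

```java
// Hypothetical sketch of an unambiguous cache key. Joining parts with a bare
// dot maps both ("a", "b.c") and ("a.b", "c") to "a.b.c"; quoting each part
// keeps them distinct. Helper names are illustrative, not Trino's code.
public class CacheKeys
{
    static String cacheKey(String schema, String table)
    {
        return quote(schema) + "." + quote(table);
    }

    static String quote(String name)
    {
        // double any embedded quotes, then wrap, as in SQL identifier quoting
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args)
    {
        System.out.println(cacheKey("a", "b.c"));  // "a"."b.c"
        System.out.println(cacheKey("a.b", "c"));  // "a.b"."c"
    }
}
```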
Force-pushed from a1ab9d5 to 2c3e111.
Metadata queries such as `information_schema.columns`, `system.jdbc.columns`, or `system.metadata.table_comments` may end up loading an arbitrary number of relations within a single query (transaction). It is important to bound memory usage for such queries. In the case of the Iceberg Hive-metastore-based catalog, this is already done in `TrinoHiveCatalogFactory` by means of configuring a per-query `CachingHiveMetastore`. However, catalogs with explicit caching need something similar.
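A minimal sketch of the general pattern the description names, assuming a Guava cache; the class, names, and bound are hypothetical illustrations, not the actual `TrinoHiveCatalogFactory` code:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Hypothetical sketch: a cache created per query (transaction) with a hard
// size bound, so that metadata queries touching many relations cannot grow
// catalog state without limit.
public class PerQueryMetadataCache
{
    private final Cache<String, Object> tableMetadata;

    public PerQueryMetadataCache(long maximumSize)
    {
        this.tableMetadata = CacheBuilder.newBuilder()
                .maximumSize(maximumSize) // evicts entries once past the bound
                .build();
    }

    public Object get(String tableKey, java.util.concurrent.Callable<Object> loader)
            throws java.util.concurrent.ExecutionException
    {
        return tableMetadata.get(tableKey, loader);
    }
}
```

Creating a fresh instance per query scopes both the lifetime and the size of cached metadata to that query.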
Force-pushed from 2c3e111 to ad51682.
@@ -689,6 +690,7 @@ public void dropCorruptedTable(ConnectorSession session, SchemaTableName schemaT
    }
    String tableLocation = metadataLocation.replaceFirst("/metadata/[^/]*$", "");
    deleteTableDirectory(fileSystemFactory.create(session), schemaTableName, tableLocation);
    invalidateTableCache(schemaTableName);
Can this be moved to `dropTableFromMetastore`, or even to `deleteTable`, to simplify the code and prevent it from being omitted if new functions are added in the future?
I thought about it. Technically it would work, but I consider `dropTableFromMetastore` to be just a technical operation, which may or may not be invoked, and may not be the last operation in the drop flow.
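A hedged sketch of the trade-off being discussed: the method names follow the diff, but the class shape and cache are hypothetical stand-ins, not Trino's code. Invalidation stays in each public drop path because `dropTableFromMetastore` is a low-level step that is not always invoked and is not guaranteed to run last.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DropFlowSketch
{
    record SchemaTableName(String schema, String table) {}

    private final Map<SchemaTableName, String> tableCache = new ConcurrentHashMap<>();

    // Each public entry point invalidates the cache itself, after all other
    // steps of its flow have completed.
    void dropTable(SchemaTableName table)
    {
        dropTableFromMetastore(table);
        deleteTableData(table);          // runs after the metastore step
        invalidateTableCache(table);     // always last in the public path
    }

    void unregisterTable(SchemaTableName table)
    {
        dropTableFromMetastore(table);
        invalidateTableCache(table);
    }

    void dropTableFromMetastore(SchemaTableName table) { /* metastore mutation only */ }
    void deleteTableData(SchemaTableName table) { /* remove files from storage */ }
    void invalidateTableCache(SchemaTableName table) { tableCache.remove(table); }
}
```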
@@ -365,6 +365,7 @@ private static Optional<String> getQueryId(io.trino.plugin.hive.metastore.Table
public void unregisterTable(ConnectorSession session, SchemaTableName schemaTableName)
{
    dropTableFromMetastore(schemaTableName);
    invalidateTableCache(schemaTableName);
Can this and the following be moved to `dropTableFromMetastore`?
No release note entry @findepi?
My understanding is that the caching here is important for ensuring that queries use the same snapshot in different phases of query execution. It's unlikely that a regular SELECT would fill this cache, but it would be nice if we could leave these caches unbounded when necessary.
I see your point and agree. We probably should also update the JDBC connector. |
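A hedged illustration of the suggestion in this exchange, assuming a Guava cache; the flag and the bound are assumptions for the sketch, not actual Trino configuration:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Illustrative only: bound the cache for metadata queries that may enumerate
// many relations, but leave it unbounded for regular queries, where the cache
// is what keeps all phases of the query on the same snapshot.
public class SnapshotCacheFactory
{
    public static Cache<String, Object> create(boolean isMetadataQuery)
    {
        CacheBuilder<Object, Object> builder = CacheBuilder.newBuilder();
        if (isMetadataQuery) {
            builder.maximumSize(1000); // hypothetical bound on cached relations
        }
        return builder.build();
    }
}
```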